Introduce the LSH compiler#753

Open
lhecker wants to merge 11 commits into main from dev/lhecker/syntax-highlighting-compiler

Conversation

lhecker (Member) commented Jan 27, 2026

This PR contains only the compiler itself. I split out everything else,
such as the CLI frontend, to reduce the PR size.

Part of #624

DHowett (Member) left a comment

14/19 plus high water mark of 171 on generator.rs; still ongoing

JumpIfMatchPrefixInsensitive { idx: u32, tgt: u32 },

// Flushes the current HighlightKind to the output.
FlushHighlight { kind: Register },
Member

(why is this its own instruction instead of a register? i guess having a special register would mean you need to check writes to every register to see if it was the special output register. however, you already have that with pc...)

Member Author

An instruction felt like the right thing to do, because it's an "action". pc is also its own "action" after all.

In the future I'll probably change the flush instruction significantly. I found it to be a poor fit for the needed flexibility. An instruction that states "highlight range x1 to x2 in color x3" would be slower, but also allow for highlighting regex capture groups.
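
To make the range-based idea concrete, here is a minimal sketch of what executing such an instruction might look like. Everything in it is hypothetical: the `HighlightRange` name, the `Span` output record, and the `Register`-as-index representation are assumptions made for illustration and are not taken from this PR.

```rust
/// Hypothetical register index; the PR's actual `Register` type may differ.
type Register = usize;

/// Hypothetical output record: bytes `start..end` are shown as `kind`.
#[derive(Debug, PartialEq)]
struct Span {
    start: u32,
    end: u32,
    kind: u32,
}

/// Hypothetical instruction: "highlight range x1 to x2 in color x3".
/// Each field names the register that holds the operand.
struct HighlightRange {
    start: Register,
    end: Register,
    kind: Register,
}

/// Executes one `HighlightRange` by reading its three operands from the
/// register file and appending a span to the output.
fn exec_highlight_range(instr: &HighlightRange, regs: &[u32], out: &mut Vec<Span>) {
    out.push(Span {
        start: regs[instr.start],
        end: regs[instr.end],
        kind: regs[instr.kind],
    });
}
```

Because the range is explicit, two capture groups of the same match could each be emitted by their own instruction, independent of the current offset. The cost is three register reads per emitted span, which is in line with the "slower" remark above.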

//! | `[a-z]?` | `Charset{cs, min=0, max=1}` - optional char |
//! | `$` | `EndOfLine` condition |
//! | `.*` | `MovImm off, MAX` - skip to end of line |
//! | `\>` | `If Charset(\w) then FAIL else MATCH` - word boundary |
Member

this feels like a vim-ism; everyone else uses \b for word boundary... right?

Member Author

\b only matches boundaries between a non-word and word character. This is different from \>, which doesn't care about the preceding character. It simply checks if the next character is a non-word character while not consuming it (look-ahead).

I forgot where I took \> from. It's not unique to Vim I believe, but I'm sure I saw it there as well.

Member Author

Ah yes, Rust also has it as "\b{end}, \> end-of-word boundary assertion". This is technically a "\b{end-half} half of an end-of-word boundary assertion" but that's too verbose so whatever.
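
The difference between the two assertions discussed in this thread can be sketched as two small predicates. This is an illustration only, not the compiler's code: the function names are made up, word characters are restricted to ASCII, and treating end of line as a successful `\>` is an assumption (it follows from the `If Charset(\w) then FAIL else MATCH` lowering, since no word character can match there).

```rust
/// Word characters in the `\w` sense, restricted to ASCII for this sketch.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `\>` as described above: succeeds when the byte at `off` is not a word
/// character, or when `off` is at the end of the line. Only looks ahead;
/// nothing is consumed and the preceding byte is never inspected.
fn end_half_boundary(line: &[u8], off: usize) -> bool {
    line.get(off).map_or(true, |&b| !is_word_byte(b))
}

/// Classic `\b`: the bytes on either side of `off` differ in "wordness".
fn word_boundary(line: &[u8], off: usize) -> bool {
    let before = off
        .checked_sub(1)
        .and_then(|i| line.get(i))
        .map_or(false, |&b| is_word_byte(b));
    let after = line.get(off).map_or(false, |&b| is_word_byte(b));
    before != after
}
```

For the line `if (x)`, both predicates succeed at offset 2 (right after `if`). At offset 3, between the space and `(`, only `end_half_boundary` succeeds, because `word_boundary` also requires a word character on one side.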

DHowett (Member) left a comment

Reviewed everything, just need a couple comment changes and a quick question answered and I think I'm ready to sign off!

lhecker enabled auto-merge (squash) March 19, 2026 22:53
2 participants